Barrier Frank-Wolfe for Marginal Inference
We introduce a globally-convergent algorithm for optimizing the
tree-reweighted (TRW) variational objective over the marginal polytope. The
algorithm is based on the conditional gradient method (Frank-Wolfe) and moves
pseudomarginals within the marginal polytope through repeated maximum a
posteriori (MAP) calls. This modular structure enables us to leverage black-box
MAP solvers (both exact and approximate) for variational inference and to obtain
more accurate results than tree-reweighted algorithms that optimize over the
local consistency relaxation. Theoretically, we bound the sub-optimality for
the proposed algorithm despite the TRW objective having unbounded gradients at
the boundary of the marginal polytope. Empirically, we demonstrate the
increased quality of results found by tightening the relaxation over the
marginal polytope as well as the spanning tree polytope on synthetic and
real-world instances.
Comment: 25 pages, 12 figures. To appear in Neural Information Processing Systems (NIPS) 2015. Corrected reference and cleaned up bibliography.
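
The algorithmic core described above -- conditional-gradient steps whose linear subproblems are solved by MAP calls, with iterates kept strictly inside the polytope because the TRW gradient blows up at the boundary -- can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names, the contraction scheme, and the step-size schedule are assumptions.

```python
def barrier_frank_wolfe(grad, map_oracle, mu0, steps=200, delta=1e-3):
    """Conditional-gradient sketch over the marginal polytope.

    grad       : gradient of the TRW-style objective at pseudomarginals mu
                 (NumPy arrays assumed throughout)
    map_oracle : black-box MAP solver; given potentials theta, returns the
                 polytope vertex (an integral assignment, encoded as
                 indicator marginals) maximizing <theta, v>
    mu0        : an interior starting point, e.g. uniform pseudomarginals
    delta      : contraction keeping iterates away from the boundary,
                 where the TRW gradient is unbounded (the "barrier" idea)
    """
    mu = mu0.copy()
    for t in range(steps):
        g = grad(mu)
        v = map_oracle(-g)                   # Frank-Wolfe linear minimization via a MAP call
        v = (1.0 - delta) * v + delta * mu0  # pull the vertex into the interior
        gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        mu = (1.0 - gamma) * mu + gamma * v
    return mu
```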
OCDaf: Ordered Causal Discovery with Autoregressive Flows
We propose OCDaf, a novel order-based method for learning causal graphs from
observational data. We establish the identifiability of causal graphs within
multivariate heteroscedastic noise models, a generalization of additive noise
models that allows for non-constant noise variances. Drawing upon the structural
similarities between these models and affine autoregressive normalizing flows,
we introduce a continuous search algorithm to find causal structures. Our
experiments demonstrate state-of-the-art performance across the Sachs and
SynTReN benchmarks in Structural Hamming Distance (SHD) and Structural
Intervention Distance (SID). Furthermore, we validate our identifiability
theory across various parametric and nonparametric synthetic datasets and
showcase superior performance compared to existing baselines.
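
A toy rendition of the order-based idea, under strong simplifying assumptions: score each candidate variable ordering by the fit of an affine heteroscedastic model (mean and log-variance each linear in the predecessors) and pick the best ordering. The paper performs a continuous search with affine autoregressive normalizing flows; the linear fits, exhaustive search, and names below are hypothetical stand-ins, feasible only for a handful of variables.

```python
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression

def score_ordering(X, order):
    """Gaussian log-likelihood of an affine heteroscedastic autoregressive
    model under a fixed ordering. Illustrative only."""
    ll = 0.0
    for i, j in enumerate(order):
        y = X[:, j]
        if i == 0:
            mu, var = y.mean(), y.var() + 1e-6
            ll += -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mu) ** 2 / var)
            continue
        P = X[:, order[:i]]                     # predecessors in the ordering
        mu = LinearRegression().fit(P, y).predict(P)
        r2 = (y - mu) ** 2
        log_var = LinearRegression().fit(P, np.log(r2 + 1e-6)).predict(P)
        var = np.exp(log_var) + 1e-6            # non-constant noise variance
        ll += -0.5 * np.sum(np.log(2 * np.pi * var) + r2 / var)
    return ll

def best_ordering(X):
    d = X.shape[1]
    return max(itertools.permutations(range(d)),
               key=lambda o: score_ordering(X, list(o)))
```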
HiCu: Leveraging Hierarchy for Curriculum Learning in Automated ICD Coding
There are several opportunities for automation in healthcare that can improve
clinician throughput. One such example is assistive tools to document diagnosis
codes when clinicians write notes. We study the automation of medical code
prediction using curriculum learning, which is a training strategy for machine
learning models that gradually increases the hardness of the learning tasks
from easy to difficult. One of the challenges in curriculum learning is the
design of curricula -- i.e., the sequencing of tasks so that they gradually
increase in difficulty. We propose Hierarchical Curriculum Learning (HiCu), an
algorithm that uses graph structure in the space of outputs to design curricula
for multi-label classification. We create curricula for multi-label
classification models that predict ICD diagnosis and procedure codes from
natural language descriptions of patients. By leveraging the hierarchy of ICD
codes, which groups diagnosis codes based on various organ systems in the human
body, we find that our proposed curricula improve the generalization of neural
network-based predictive models across recurrent, convolutional, and
transformer-based architectures. Our code is available at
https://github.com/wren93/HiCu-ICD.
Comment: To appear at the Machine Learning for Healthcare Conference (MLHC 2022).
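
A minimal sketch of hierarchy-derived curricula in the spirit of the abstract: map each leaf code to its ancestor at successively deeper levels, then train on coarse labels before fine ones. The toy hierarchy, encoder choice, and training stub below are illustrative assumptions, not the HiCu algorithm itself.

```python
import torch.nn as nn

# Toy ICD-like hierarchy: each leaf code's path from root to leaf (assumed data).
code_to_path = {
    "I21.0": ["circulatory", "ischemic", "I21", "I21.0"],
    "I21.1": ["circulatory", "ischemic", "I21", "I21.1"],
    "J18.9": ["respiratory", "pneumonia", "J18", "J18.9"],
}

def curriculum_levels(code_to_path, depth):
    """Level d maps each leaf code to its depth-d ancestor, yielding a
    sequence of progressively harder multi-label targets (coarse -> fine)."""
    return [
        {code: path[min(d, len(path)) - 1] for code, path in code_to_path.items()}
        for d in range(1, depth + 1)
    ]

encoder = nn.GRU(input_size=128, hidden_size=256, batch_first=True)  # any text encoder
for level, mapping in enumerate(curriculum_levels(code_to_path, depth=4), start=1):
    labels = sorted(set(mapping.values()))
    head = nn.Linear(256, len(labels))   # fresh output head per curriculum level
    # ... train encoder + head with nn.BCEWithLogitsLoss() on level-`level`
    # targets, reusing the encoder weights learned at the previous level ...
```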
Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding
We present Clinical Camel, an open large language model (LLM) explicitly
tailored for clinical research. Fine-tuned from LLaMA-2 using QLoRA, Clinical
Camel achieves state-of-the-art performance across medical benchmarks among
openly available medical LLMs. Leveraging efficient single-GPU training,
Clinical Camel surpasses GPT-3.5 in five-shot evaluations on all assessed
benchmarks, including 64.3% on the USMLE Sample Exam (compared to 58.5% for
GPT-3.5), 77.9% on PubMedQA (compared to 60.2%), 60.7% on MedQA (compared to
53.6%), and 54.2% on MedMCQA (compared to 51.0%). In addition to these
benchmarks, Clinical Camel demonstrates its broader capabilities, such as
synthesizing plausible clinical notes. This work introduces dialogue-based
knowledge encoding, a novel method to synthesize conversational data from dense
medical texts. While benchmark results are encouraging, extensive and rigorous
human evaluation across diverse clinical scenarios is imperative to ascertain
safety before implementation. By openly sharing Clinical Camel, we hope to
foster transparent and collaborative research, working towards the safe
integration of LLMs within the healthcare domain. Significant challenges
concerning reliability, bias, and the potential for outdated knowledge persist.
Nonetheless, the transparency provided by an open approach reinforces the
scientific rigor essential for future clinical applications.
Comment: For model weights, see https://huggingface.co/wanglab.
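
The fine-tuning recipe named in the abstract (QLoRA on a LLaMA-2 base) looks roughly like the following with the Hugging Face transformers/peft/bitsandbytes stack. The base checkpoint, ranks, and target modules are illustrative guesses, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-13b-hf"          # illustrative base; the paper fine-tunes LLaMA-2

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base)

lora = LoraConfig(                          # low-rank adapters; values are assumptions
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()          # only the adapters train, enabling single-GPU runs
```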
Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models
We study prediction of future outcomes with supervised models that use
privileged information during learning. The privileged information comprises
samples of time series observed between the baseline time of prediction and the
future outcome; this information is available only at training time, which
distinguishes the setting from traditional supervised learning. We ask when using
this privileged data leads to more sample-efficient learning of models that use
only baseline data for predictions at test time. We give an algorithm for this
setting and prove that when the time series are drawn from a non-stationary
Gaussian-linear dynamical system of fixed horizon, learning with privileged
information is more efficient than learning without it. On synthetic data, we
test the limits of our algorithm and theory, both when our assumptions hold and
when they are violated. On three diverse real-world datasets, we show that our
approach is generally preferable to classical learning, particularly when data
is scarce. Finally, we relate our estimator to a distillation approach both
theoretically and empirically.
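
A bare-bones rendition of learning with time-series privileged information, under the linear-Gaussian assumption the abstract states: fit a chain of linear maps through the intermediate series (seen only in training) and compose them into a baseline-only predictor. The helper names and the use of plain least squares are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_chained_predictor(X, Zs, Y):
    """Chain linear maps X -> Z_1 -> ... -> Z_T -> Y through the privileged
    intermediate series Zs (available only during training); the composed
    predictor needs only baseline X at test time."""
    stages = [X] + list(Zs)
    chain = [LinearRegression().fit(s, t) for s, t in zip(stages[:-1], stages[1:])]
    head = LinearRegression().fit(stages[-1], Y)

    def predict(X_new):
        h = X_new
        for reg in chain:        # roll the estimated dynamics forward from baseline
            h = reg.predict(h)
        return head.predict(h)

    return predict

# Toy usage: two privileged time steps between baseline and outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Z1 = X @ rng.normal(size=(3, 3)) + 0.1 * rng.normal(size=(500, 3))
Z2 = Z1 @ rng.normal(size=(3, 3)) + 0.1 * rng.normal(size=(500, 3))
Y = Z2 @ rng.normal(size=(3, 1)) + 0.1 * rng.normal(size=(500, 1))
predict = fit_chained_predictor(X, [Z1, Z2], Y)
print(predict(X[:5]).shape)      # (5, 1)
```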
DuETT: Dual Event Time Transformer for Electronic Health Records
Electronic health records (EHRs) recorded in hospital settings typically
contain a wide range of numeric time series data that is characterized by high
sparsity and irregular observations. Effective modelling for such data must
exploit its time series nature, the semantic relationship between different
types of observations, and information in the sparsity structure of the data.
Self-supervised Transformers have shown outstanding performance in a variety of
structured tasks in NLP and computer vision. However, multivariate time series data
contains structured relationships over two dimensions: time and recorded event
type, and straightforward applications of Transformers to time series data do
not leverage this distinct structure. The quadratic scaling of self-attention
layers can also significantly limit the input sequence length without
appropriate input engineering. We introduce the DuETT architecture, an
extension of Transformers designed to attend over both time and event type
dimensions, yielding robust representations from EHR data. DuETT uses an
aggregated input where sparse time series are transformed into a regular
sequence with fixed length; this lowers the computational complexity relative
to previous EHR Transformer models and, more importantly, enables the use of
larger and deeper neural networks. When trained with self-supervised prediction
tasks, which provide rich and informative signals for model pre-training, our
model outperforms state-of-the-art deep learning models on multiple downstream
tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.
Comment: Accepted at MLHC 2023, camera-ready version.
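
One way to realize attention over both axes the abstract describes is axial self-attention: attend along time with event types folded into the batch, then along event types with time folded in. The block below is a minimal PyTorch sketch of that pattern, not the published DuETT layer; dimensions and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class DualAxisBlock(nn.Module):
    """Self-attention along the time axis, then along the event-type axis,
    with residual connections and layer norms."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.event_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                                 # x: (batch, time, events, dim)
        b, t, e, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * e, t, d)    # fold events into batch
        h = self.norm1(h + self.time_attn(h, h, h)[0])    # attend over time
        h = h.reshape(b, e, t, d).permute(0, 2, 1, 3).reshape(b * t, e, d)
        h = self.norm2(h + self.event_attn(h, h, h)[0])   # attend over event types
        return h.reshape(b, t, e, d)

x = torch.randn(2, 16, 8, 32)        # toy: 16 time bins, 8 event types, dim 32
print(DualAxisBlock(32)(x).shape)    # torch.Size([2, 16, 8, 32])
```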